Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[wip] rangefeed: improve memory accounting for event queue #120001

Closed
wants to merge 4 commits into from

Conversation

wenyihu6
Copy link
Contributor

@wenyihu6 wenyihu6 commented Mar 6, 2024

We can use the processor's resolved timestamp to predict if a checkpoint event will happen. But calling p.rts.ConsumeLogicalOp for memory accounting can change internal field unresolvedIntentQueue. To resolve this, we could make a deep copy of rts, but this is on a hot path and could introduce significant overhead.

Related: #73616
Resolves: #114190

Copy link

blathers-crl bot commented Mar 6, 2024

It looks like your PR touches production code but doesn't add or edit any test code. Did you consider adding tests to your PR?

🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.

@cockroach-teamcity
Copy link
Member

This change is Reviewable

@wenyihu6 wenyihu6 force-pushed the memoryaccounting branch 6 times, most recently from 21cb347 to 963df91 Compare March 11, 2024 20:56
Copy link
Contributor

@erikgrinaker erikgrinaker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice, the overall approach here makes sense. Leaving some minor comments.

Reviewed 1 of 4 files at r3, all commit messages.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @wenyihu6)


pkg/kv/kvserver/rangefeed/event_size.go line 91 at r4 (raw file):

func checkpointRangeFeedEvent(span roachpb.RSpan, ts resolvedTimestamp) kvpb.RangeFeedEvent {
	var event kvpb.RangeFeedEvent

Will these cause memory allocations? We should avoid those on the hot path (we do a bunch of them already, but should try not to add any more). You can find out by writing a small benchmark using dev bench --bench-mem, or using scripts/goescape to see if any of these escape to the heap.


pkg/kv/kvserver/rangefeed/event_size.go line 110 at r4 (raw file):

func (ct ctEvent) futureMemUsage(span roachpb.RSpan, rts resolvedTimestamp) int64 {
	if rts.ForwardClosedTS(context.Background(), ct.Timestamp) {

We shouldn't call ForwardClosedTS here, nor rts.Init() for initRTS. These will actually advance the rangefeed's resolved timestamp, which will cause incorrect behavior when a later call e.g. during consumeEvent sees ForwardClosedTS return false because the resolved timestamp was already advanced when computing the event size.


pkg/kv/kvserver/rangefeed/event_size.go line 196 at r4 (raw file):

	switch re.GetValue().(type) {
	case *kvpb.RangeFeedValue:
		memUsage += rangefeedValueOverhead + int64(re.Size())

Size() won't really give an accurate representation of the memory size, as it computes the Protobuf-encoded wire size (which uses stuff like varints to compress integers, and omits fields with zero values).

I wonder if it would be sufficient here to simply compute the event memory usage, and assume that the Protobuf events have a size that's roughly equal to it. In other words, just use memUsage() and drop futureMemUsage(). How large is the diff usually?


pkg/kv/kvserver/rangefeed/scheduled_processor.go line 703 at r4 (raw file):

		if p.rts.ConsumeLogicalOp(ctx, op) {
			// May publish checkpoint we are over accounting here -> we should have a separate token for publishcheckpoint maybe
			p.publishCheckpoint(ctx, alloc)

I think this is fine, but we could also consider just passing a nil alloc here and ignoring these events. They're most likely few enough that they won't matter, and they should be small compared to the events that trigger them (that we've already allocated budget for).

@wenyihu6
Copy link
Contributor Author

pkg/kv/kvserver/rangefeed/event_size.go line 91 at r4 (raw file):

Previously, erikgrinaker (Erik Grinaker) wrote…

Will these cause memory allocations? We should avoid those on the hot path (we do a bunch of them already, but should try not to add any more). You can find out by writing a small benchmark using dev bench --bench-mem, or using scripts/goescape to see if any of these escape to the heap.

Good point. I think ts.Get() will always be the size of TimeStamp struct, but the size of span might need to be based on some evidence. I added a microbenchmark in this new PR #120347. I will compare it against the current calculateDataSize function later on.

@wenyihu6
Copy link
Contributor Author

pkg/kv/kvserver/rangefeed/event_size.go line 110 at r4 (raw file):

Previously, erikgrinaker (Erik Grinaker) wrote…

We shouldn't call ForwardClosedTS here, nor rts.Init() for initRTS. These will actually advance the rangefeed's resolved timestamp, which will cause incorrect behavior when a later call e.g. during consumeEvent sees ForwardClosedTS return false because the resolved timestamp was already advanced when computing the event size.

Yeah : ( I realized this when I did more testing. I changed this to always publish checkpoint event in #120347. Since we are possibly over-accounting now, I introduced another PR d303dae to adjust the memory before publish to avoid holding on over-accounted memory from being held for too long. We can discuss more in our meeting.

@wenyihu6
Copy link
Contributor Author

Closing in favour of: #120347

@wenyihu6 wenyihu6 closed this Mar 13, 2024
@wenyihu6 wenyihu6 deleted the memoryaccounting branch May 12, 2024 23:47
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

kvserver/rangefeed: checkpoint events can cause OOMs
3 participants